# run setup script
#install.packages("remotes")
#library(remotes)
#remotes::install_github("clauswilke/dviz.supp")
#devtools::install_github("clauswilke/dviz.supp")
library(colorspace)
library(dplyr)
library(tidyverse)
library(ggforce)
library(ggridges)
library(treemapify)
library(forcats)
library(statebins)
library(sf)
library(cowplot)
options(digits = 3)
knitr::opts_chunk$set(
  echo = FALSE,
  message = FALSE,
  warning = FALSE,
  cache = FALSE,
  #dpi = 105, # not sure why, but need to divide this by 2 to get 210 at 6in,
  # which is 300 at 4.2in
  fig.align = 'center',
  fig.width = 6,
  fig.asp = 0.618, # 1 / phi
  fig.show = "hold"
)
options(dplyr.print_min = 6, dplyr.print_max = 6)
We first read in two data sets, “income” and “life”, which hold income and life expectancy values across many years. “income” has 193 observations and 220 variables, while “life” has 187 observations and 220 variables. Next, we reshape both data sets so that each has only three columns (Geo, Year, and Income or Life Expectancy). We then merge the two reshaped sets into a data set called “LifeExpIncom”, which contains Geo, Year, Income, and Life Expectancy (40953 observations and 4 variables).

We then read in two more sets, “country” (240 observations and 11 variables) and “pop” (195 observations and 220 variables), which hold country and population data respectively. We reshape “pop” so that Year becomes a single column, matching the layout of “LifeExpIncom”. We can then merge “LifeExpIncom” with “country”, and merge the result with the reshaped “pop” set, creating a set called “fin_data” (42705 observations and 15 variables). Finally, we subset the data so that we focus only on the year 2000. This gives us our “final_data” set (195 observations and 15 variables):
     geo                year             population       life.expectancy
 Length:195         Length:195         Min.   :7.85e+02   Min.   :44.1
 Class :character   Class :character   1st Qu.:1.26e+06   1st Qu.:61.2
 Mode  :character   Mode  :character   Median :6.01e+06   Median :70.5
                                       Mean   :3.13e+07   Mean   :67.3
                                       3rd Qu.:1.90e+07   3rd Qu.:74.7
                                       Max.   :1.28e+09   Max.   :81.8
                                                          NA's   :8
     income          alpha.2            alpha.3           country.code
 Min.   :   529   Length:195         Length:195         Min.   :  4
 1st Qu.:  2335   Class :character   Class :character   1st Qu.:209
 Median :  6860   Mode  :character   Mode  :character   Median :418
 Mean   : 13667                                         Mean   :425
 3rd Qu.: 15700                                         3rd Qu.:643
 Max.   :108000                                         Max.   :894
 NA's   :8                                              NA's   :21
  iso_3166.2           region           sub.region       intermediate.region
 Length:195         Length:195         Length:195         Length:195
 Class :character   Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character   Mode  :character
  region.code     sub.region.code   intermediate.region.code
 Min.   :  2.0    Min.   : 15       Min.   : 5.0
 1st Qu.:  2.0    1st Qu.: 54       1st Qu.:11.0
 Median : 19.0    Median :154       Median :14.0
 Mean   : 71.7    Mean   :178       Mean   :14.9
 3rd Qu.:142.0    3rd Qu.:202       3rd Qu.:17.0
 Max.   :150.0    Max.   :419       Max.   :29.0
 NA's   :21       NA's   :21        NA's   :119
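The reshape, merge, and subset steps described above can be sketched as follows. This is a minimal illustration on invented toy tables, not the report's actual code: the use of `pivot_longer()` and `inner_join()`, the toy values, and the object names `income_long`, `life_long`, `life_exp_income`, and `one_year` are all assumptions. The reported row count of 40953 (187 × 219) is consistent with keeping only countries present in the life table, but the join type used in the report is not shown.

```r
library(tidyr)
library(dplyr)

# Toy wide tables standing in for the real "income" and "life" files.
# Values are invented; the real files have one column per year.
income <- tibble(geo = c("aaa", "bbb", "ccc"),
                 `1999` = c(1000, 5000, 20000),
                 `2000` = c(1100, 5200, 21000))
life <- tibble(geo = c("aaa", "bbb"),
               `1999` = c(55.0, 68.0),
               `2000` = c(55.5, 68.4))

# Wide -> long: one row per (geo, year) pair.
income_long <- pivot_longer(income, -geo,
                            names_to = "year", values_to = "income")
life_long <- pivot_longer(life, -geo,
                          names_to = "year", values_to = "life.expectancy")

# Assumed inner join: keeps only countries present in both tables.
life_exp_income <- inner_join(income_long, life_long, by = c("geo", "year"))

# Keep a single year. After reshaping, year is a character column,
# so it is compared against a string.
one_year <- filter(life_exp_income, year == "2000")
print(one_year)
```

Because the year values come from column names, they end up as character strings, which matches the `Class :character` entry for `year` in the summary output.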
The above scatter plot shows the relationship between income, life
expectancy, and population size across countries in different regions in
the year 2000. Each point is a country, and the size of each point
corresponds to the population of that country. The points are color
coded as well.
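A bubble chart of this kind could be drawn roughly as below. This is only a sketch on invented data: the axis assignment, the choice to color by `region`, and the `toy` data frame are assumptions, since the plotting code is not shown in the report.

```r
library(ggplot2)

# Hypothetical stand-in for the one-year table; values are invented.
toy <- data.frame(
  income          = c(800, 3000, 12000, 40000),
  life.expectancy = c(52, 63, 72, 80),
  population      = c(2e6, 5e7, 1.2e9, 8e6),
  region          = c("Africa", "Americas", "Asia", "Europe")
)

# One point per country: position from income and life expectancy,
# point area from population, color from region (assumed mapping).
p <- ggplot(toy, aes(x = income, y = life.expectancy,
                     size = population, color = region)) +
  geom_point(alpha = 0.7) +
  scale_size_area(max_size = 12) +
  labs(x = "Income", y = "Life expectancy (years)",
       size = "Population", color = "Region")

print(p)
```

`scale_size_area()` makes point area, rather than radius, proportional to population, which keeps very large countries from visually overwhelming the rest.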
From the plot, we can see a slight positive correlation between income and life expectancy: countries with higher incomes tend to have longer life expectancies. Eyeballing the plot, countries in the Americas and Asia appear to have the largest populations. The plot also suggests that countries with larger populations tend to have longer life expectancies. Countries in Europe seem to have the longest life expectancies, with most of their points on the far right side of the graph, although their populations aren’t as large as those of other countries.
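A visual impression of correlation can be checked numerically. The sketch below uses invented values in a hypothetical `toy` data frame, so the resulting numbers say nothing about the real data; it only shows how `cor()` could be applied while skipping the rows with missing values that appear in the summary output.

```r
# Hypothetical stand-in for the one-year table; values are invented.
toy <- data.frame(
  income          = c(600, 1500, 4000, 9000, 25000, 60000, NA),
  life.expectancy = c(50, 58, 66, 71, 77, 81, 70)
)

# Pearson correlation, dropping rows with a missing value in either column.
r_raw <- cor(toy$income, toy$life.expectancy, use = "complete.obs")

# Same on log10(income). Incomes span several orders of magnitude,
# so the relationship is often closer to linear on a log scale.
r_log <- cor(log10(toy$income), toy$life.expectancy, use = "complete.obs")

print(c(raw = r_raw, log = r_log))
```

Comparing the raw and log-scale coefficients is one way to tell whether a weak-looking trend is genuinely weak or just curved.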
For the second plot, we subset the data so that we focus only on the year 2015. This again gives us a “final_data” set (195 observations and 15 variables). First, let’s look at the overall summary statistics for the data set “fin_data”, which contains data from all years, not just 2015.
     geo                year             population       life.expectancy
 Length:42705       Length:42705       Min.   :6.42e+02   Min.   : 1
 Class :character   Class :character   1st Qu.:2.83e+05   1st Qu.:31
 Mode  :character   Mode  :character   Median :1.71e+06   Median :36
                                       Mean   :1.30e+07   Mean   :43
                                       3rd Qu.:5.94e+06   3rd Qu.:56
                                       Max.   :1.42e+09   Max.   :84
                                                          NA's   :2268
     income          alpha.2            alpha.3           country.code
 Min.   :   247   Length:42705       Length:42705       Min.   :  4
 1st Qu.:   875   Class :character   Class :character   1st Qu.:208
 Median :  1440   Mode  :character   Mode  :character   Median :418
 Mean   :  4591                                         Mean   :425
 3rd Qu.:  3460                                         3rd Qu.:643
 Max.   :178000                                         Max.   :894
 NA's   :1752                                           NA's   :4599
  iso_3166.2           region           sub.region       intermediate.region
 Length:42705       Length:42705       Length:42705       Length:42705
 Class :character   Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character   Mode  :character
  region.code     sub.region.code   intermediate.region.code
 Min.   :  2      Min.   : 15       Min.   : 5
 1st Qu.:  2      1st Qu.: 54       1st Qu.:11
 Median : 19      Median :154       Median :14
 Mean   : 72      Mean   :178       Mean   :15
 3rd Qu.:142      3rd Qu.:202       3rd Qu.:17
 Max.   :150      Max.   :419       Max.   :29
 NA's   :4599     NA's   :4599      NA's   :26061
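Output like the above comes from calling `summary()` on a data frame. As a rough sketch on an invented `toy` table (all names and values here are hypothetical), the same call can be paired with an explicit per-column count of missing values, which is easier to scan than the `NA's` rows:

```r
# Hypothetical stand-in for the merged all-years table; values are invented.
toy <- data.frame(
  geo             = c("aaa", "aaa", "bbb", "bbb"),
  year            = c("2014", "2015", "2014", "2015"),
  income          = c(1000, 1100, NA, 5200),
  life.expectancy = c(55.0, 55.5, 68.0, NA),
  stringsAsFactors = FALSE
)

# Subset to one year; year is stored as character, so compare to a string.
toy_2015 <- toy[toy$year == "2015", ]

# Summary statistics over all years.
print(summary(toy))

# Explicit count of missing values per column.
na_counts <- colSums(is.na(toy))
print(na_counts)
```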
The above plot shows the relationship between income, life expectancy, and population size across different countries in the year 2015. Each point is a country, and the size of each point corresponds to the population of that country. The countries are color coded as well.
The x-axis shows the income level of each country, so countries with higher incomes appear farther to the right. The y-axis shows life expectancy, so countries with higher life expectancy appear higher up. From the plot, we can see that a few countries dominate the scatter plot because of their population size and income. We can use the plot to check whether countries with higher incomes generally have longer life expectancies, or to examine whether population size is associated with higher or lower income levels.
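Both questions raised above (income against life expectancy, and population against income) can be examined together with a rank correlation matrix. The sketch below is an illustration on invented values in a hypothetical `toy` data frame; Spearman's method is chosen here as an assumption, because it only relies on rank order and so is not distorted by a few very large populations or incomes.

```r
# Hypothetical stand-in for the 2015 table; values are invented.
toy <- data.frame(
  income          = c(600, 1500, 4000, 9000, 25000, 60000),
  life.expectancy = c(50, 58, 66, 71, 77, 81),
  population      = c(3e7, 1.2e9, 5e6, 2e8, 8e6, 4e5)
)

# Pairwise Spearman rank correlations between all three variables.
# pairwise.complete.obs skips missing values one pair of columns at a time.
rho <- cor(toy, method = "spearman", use = "pairwise.complete.obs")
print(round(rho, 2))
```

Each off-diagonal entry of `rho` answers one of the pairwise questions: a value near 1 or -1 indicates a strong monotonic relationship, and a value near 0 indicates little or none.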